
Bioinformatics Data Skills

Utah Valley University - BIOL490R (Special Topics)

Course Syllabus

Course file repository

Shared Course Notes

(Anyone with link can edit)


Table of Contents

Week 1 | Week 5 | Week 9 | Week 13

Week 2 | Week 6 | Week 10 | Week 14

Week 3 | Week 7 | Week 11 | Week 15

Week 4 | Week 8 | Week 12 | Week 16


Command Line Projects and the Unix Philosophy

Week 1

Ideology of ‘Robust and Reproducible’ Bioinformatics

Topics:

  • What are “data skills?” | Reproducibility and open science | How to learn bioinformatics | Documentation | The importance of caution

Assignments:

  • Read through BDS Chapter 1… twice, and carefully
  • Find and explore the supplemental materials for the chapter on GitHub
  • Assignment 1 - Reflection piece on why you want to learn command-line skills and best practices
  • Set up your computer environment (command line, Git)

Resources

For your consideration:

  • “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” – Brian Kernighan
  • “Since the computer is a sharp enough tool to be really useful, you can cut yourself on it.” – John Tukey



Week 2

Proper Project Organization

Topics:

  • One directory per project | data as ‘read-only’ | rules for naming things | project structure | documentation

Assignments:

  • Read through BDS Chapter 2 at least once
  • Work through BDS Chapter 2, following along in your own terminal (data used is in online repository)
  • Assignment 2 - Create an organized project template using code

Resources

Practice

  • Re-create your project directory template by copy-pasting each line of code from your assignment to make sure it gives the same result
  • Spend time making sure that you intuitively understand relative file paths and get comfy with the terminal
  • Spend 2-3 hours mucking about in your terminal reworking the lines from Chapter 2 over and over until it feels normal


Unix Refresher and Sequence Data Types

Week 3

Remedial Unix Shell

Topics:

Assignments:

  • Work through BDS Chapter 3
  • Assignment 3 - Use pipes and redirects

Resources

Practice



Week 4

Working with Sequence Data

Topics:

Assignments:

  • Work through BDS Chapter 10
  • Assignment 4 - Trim reads, count nucleotides, and convert from FASTQ to FASTA
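For reference, the record logic behind Assignment 4 can be sketched in Python. This is a minimal sketch, not the assigned solution (the assignment is meant to be done with the Unix tools from BDS Chapter 10). It assumes unwrapped four-line FASTQ records, and "trimming" here simply truncates each read to a fixed length through a hypothetical `max_len` parameter.

```python
from collections import Counter


def fastq_to_fasta(fastq_lines, max_len=None):
    """Convert four-line FASTQ records to FASTA lines and count nucleotides.

    Assumes every record is exactly four lines (header, sequence, '+',
    quality) with no wrapped sequences. `max_len` is a hypothetical
    fixed-length trim: each read is cut to its first `max_len` bases.
    Returns (fasta_lines, nucleotide_counts).
    """
    lines = [line.rstrip("\n") for line in fastq_lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input is not a multiple of four lines")

    fasta = []
    counts = Counter()
    for i in range(0, len(lines), 4):
        header, seq, plus, _qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if max_len is not None:
            seq = seq[:max_len]  # keep only the first max_len bases
        counts.update(seq)  # tally each base (A, C, G, T, N, ...)
        fasta.append(">" + header[1:])  # FASTA headers use '>' instead of '@'
        fasta.append(seq)
    return fasta, counts


if __name__ == "__main__":
    reads = ["@read1\n", "ACGTN\n", "+\n", "IIIII\n",
             "@read2\n", "GGCA\n", "+\n", "IIII\n"]
    fasta, counts = fastq_to_fasta(reads, max_len=3)
    print("\n".join(fasta))
    print(dict(counts))
```

Quality-based trimming (cutting reads by Phred score) is a different operation; this sketch only shows how the four-line FASTQ structure maps onto two-line FASTA records.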

Resources

Practice


Using Existing Tools on the Command Line

Week 5

Combining Unix Skills and Command-Line Software

Topics:

Assignments:

  • Case study 1 - Run ITSxpress on fungal data
    • keep complex results
    • customize output for question at hand
    • make it reproducible
    • run it on a hundred files
    • store all log data in one file: for how many sequences, in TOTAL, were no ITS start or stop sites identified?
    • push “workflow.txt” (not the data)
    • uses chapters 2, 3, 10 (for loops, grep, redirecting stderr with 2>, flags)

Resources

Practice


More Powerful Unix Tools

Week 6

Unix Data Tools

Topics:

Assignments:

  • Work through BDS Chapter 7

Resources

Practice



Week 7

Unix Data Tools, Continued

Topics:

Assignments:

  • Continue working through BDS Chapter 7
  • Assignment 5 - Build a tabular file from a FASTA database
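A minimal Python sketch of the FASTA-to-table idea behind Assignment 5. The columns (ID, description, sequence length) and the function names are hypothetical choices for illustration; the assignment itself asks for the Unix tools from BDS Chapter 7, and your table may need different fields.

```python
import csv
import io


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text.

    Sequence lines may be wrapped; they are joined per record.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:  # don't forget the last record
        records.append((header, "".join(chunks)))
    return records


def fasta_to_tsv(text):
    """Build a tab-separated table with hypothetical columns id, description, length."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "description", "length"])
    for header, seq in parse_fasta(text):
        # Treat everything before the first space as the ID, the rest as description
        seq_id, _, description = header.partition(" ")
        writer.writerow([seq_id, description, len(seq)])
    return buf.getvalue()


if __name__ == "__main__":
    demo = ">seq1 Fungi sp.\nACGTAC\nGT\n>seq2\nGGG\n"
    print(fasta_to_tsv(demo), end="")
```

Joining wrapped sequence lines before measuring length is the step that one-line-at-a-time Unix approaches most often get wrong, which is why the parser collects chunks per record.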

Resources

Practice


Finding and Retrieving Data

Week 8

Online Repositories and Approaches to Downloading

Topics:

Assignments:

  • Work through BDS Chapter 6
  • Assignment 6 - Download data with FTP, curl, EDirect, and the SRA Toolkit
  • Case Study 2 - Reproducibly downloading data (BDS p. 120)

    • Full documentation
    • Checksums
    • Markdown README
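Checksums let you confirm that a downloaded file matches what the repository published. The sketch below computes them with Python's standard `hashlib` module, assuming SHA-1 by default (MD5 works too); on the command line you would normally get the same digests from tools such as `shasum` or `md5sum`. The function names are hypothetical, and `checksum_line` only roughly mimics the `digest  filename` layout those tools print.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file.

    Reads in chunks (1 MiB by default) so large downloads need not fit in
    memory. `algorithm` is any name hashlib.new() accepts, e.g. "sha1", "md5".
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected, algorithm="sha1"):
    """True if the file's digest matches a published checksum (case-insensitive)."""
    return file_checksum(path, algorithm) == expected.strip().lower()


def checksum_line(path, algorithm="sha1"):
    """Format '<digest>  <file name>', using only the file name, not the full path."""
    return f"{file_checksum(path, algorithm)}  {Path(path).name}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "reads.fasta"
        demo.write_bytes(b">r1\nACGT\n")
        print(checksum_line(demo))
        print(checksum_line(demo, "md5"))
```

Recording these lines in your README alongside the download commands lets anyone rerun the download later and check that they got byte-identical files.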

Resources

Practice


Working with Supercomputers

Week 9

Interfacing with Remote Machines

Topics:

Assignments:

  • Work through BDS Chapter 4 before class this week
  • Assignment 7 - Build three separate SLURM scripts to run FASTA analyses

Resources

Practice



Week 10

Interfacing with Remote Machines, Continued

Topics:

Assignments:

Resources

Practice


Version Control and Collaborations

Week 11

Git for Scientists

Topics:

  • Git workflow
  • GitHub
  • Collaborating with Git

Assignments:

  • Work through BDS Chapter 5
  • Assignment 8 - Git collaboration and merge
  • Group effort: everyone, in turn, makes changes to this repository

Resources

Practice



Week 12

Bioinformatics Shell Scripting

Topics:

Assignments:

  • Work through BDS Chapter 12
  • Assignment 9 - Git collaboration and merge

Resources

Practice

  • In-class collaborative name list


Putting It All Together

Week 13

Composing Full Pipelines

Topics:

Assignments:

  • Continue working through BDS Chapter 12

Resources

Practice



Week 14

Running a Pipeline on a Remote Machine

Topics:

Assignments:

  • Case Study 3 - Assemble a metagenome on the remote cluster

    • metaSPAdes
    • classify reads with DIAMOND?

Resources

Practice



Week 15

Creating a Custom Bioinformatics Tool

Topics:

  • Testing with toy examples

Assignments:

  • Case Study 4 - Download NCBI marker genes and use Unix tools to build a custom RDP-Classifier-compatible reference database

    • Re-engineer https://github.com/gzahn/Format_NCBI_QIIME
    • EDirect (command-line version of the NCBI Entrez search tools)
    • FTP, BLAST, NCBI, data cleaning, and reformatting
    • Turn into a completely reproducible and portable script
    • requires installing and using entrez_qiime.py
    • must be well documented
    • push the tool to GitHub
    • uses chapters 2, 3, 5, 6, 7, 10, 12
    • the script should automate downloading and building the database, printing helpful messages along the way



Week 16

Where to go from here?

Topics:

Assignments:

  • Assignment 10 - Reflection piece on what you’ve learned and what next steps you’ll take
